I am a research fellow at the National University of Singapore, working with Prof. Mong-Li Lee and Prof. Wynne Hsu at IDS, and with Prof. Tat-Seng Chua at NExT++. Previously, I was an associate researcher at Skywork AI Singapore, working with Prof. Shuicheng Yan, and before that an associate researcher at the SEA AI Lab. I received my Ph.D. from Wuhan University.
My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM-MM, TPAMI, TKDE, TOIS, TNNLS, and TASLP. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Influential Papers, and one received the 2024 WAIC Outstanding Paper Award. I received the 2023 WAIC (World AI Conference) Rising Star award and was ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University. I regularly serve as (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and served on the organization committees of WSDM 2022, EMNLP 2023, and ACL 2024. I serve as Associate Editor of several journals, including TALLIP and Neurocomputing, and am a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis was awarded the Excellent Doctoral Thesis of the Chinese Information Processing Society (CIPS), and I won more than ten honors and awards during my Ph.D. studies.
My research interests lie in NLP, CV, and their intersection (i.e., multimodal/vision-language learning). My long-term goal is to achieve human-level AI centered on multimodal LLMs and generalists. While I previously worked extensively on structural modeling of language and vision, my most recent focus is on unified multimodal generalists toward human-level capacity (modality, task, knowledge) and cognition (reasoning, affection), with the following key topics and representative works (detailed in my research statement):
▶ Multimodal Foundation Models: Unified multimodal LLMs and generalists.
▶ Capacity: Perception/generation across modalities and tasks, knowledge acquisition, and information extraction.
▶ Cognition: Cross-modal complex neuro-symbolic reasoning and human-centric affective computing.
I am always looking for collaborations on the above topics; remote collaboration is also possible. For promising students, I will provide sufficient GPUs. Feel free to reach out if you are a Ph.D./master's/bachelor's student interested in my current work. If you are from a Chinese university, there are also potential vacancies for research interns (e.g., self-/CSC-funded joint Ph.D. projects). Please describe your research status and attach your resume.
Five papers are accepted by AAAI 2025: 1) MLLM Hallucination, 2) Social VQA, 3) Multimodal Meme Understanding, 4) Chain-of-Multimodal-Thought Benchmark, and 5) Intent Detection. Congrats to all my co-authors!
• 2 Nov 2024: The video recording of our Multimodal LLM tutorial at ACM MM 2024 is released on YouTube; all slides and materials are available on my homepage.
• 25 Oct 2024: We will give a tutorial at ACM MM 2024 on Monday, 28 Oct, 9:00-12:30, on the hot topic of MLLMs: Architecture, Modality, Function, Instruction, Hallucination, Evaluation, Reasoning and Beyond. Please stay tuned to the program; both on-site and online attendance are welcome.
• 26 Sep 2024: Eight papers are accepted by NeurIPS 2024, all on multimodal LLMs and learning. Congrats to all my co-authors!
• 20 Sep 2024: Three papers are accepted by EMNLP 2024 (Main/Findings): 1) Commonsense Reasoning, 2) Legal Text Generation, and 3) a Survey on Conversational Understanding. Congrats to all my co-authors!
• 16 Sep 2024: Ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University.
• 16 July 2024: Four papers are accepted by ACM MM 2024: 1) Multimodal Conversational ABSA, 2) Speech Event Extraction, 3) Multimodal Coreference Resolution, and 4) Visual Programs. Congrats to all my co-authors!